Quote Extraction and Attribution from Norwegian Newspapers
Abstract
We present ongoing work that, for the first time, seeks to extract and attribute politicians’ quotations from Norwegian Bokmål newspapers. Our method – using a statistical dependency parser, a few regular expressions and a look-up table – gives modest recall (a best of .570) but very high precision (.978) and attribution accuracy (.987) for a restricted set of speaker names. We suggest that this is already sufficient to support some kinds of important social science research, but also identify ways in which performance could be improved.
Similar resources
Ord i Dag: Mining Norwegian Daily Newswire
We present Ord i Dag, a new service that displays today's most important keywords. These are extracted fully automatically from Norwegian online newspapers. Describing the complete process, we provide an entirely disclosed method for media monitoring and news summarization. For keyword extraction, a reference corpus serves as background about average language use, which is contrasted with the c...
A Sequence Labelling Approach to Quote Attribution
Quote extraction and attribution is the task of automatically extracting quotes from text and attributing each quote to its correct speaker. The present state-of-the-art system uses gold standard information from previous decisions in its features, which, when removed, results in a large drop in performance. We treat the problem as a sequence labelling task, which allows us to incorporate seque...
Automatically Detecting and Attributing Indirect Quotations
Direct quotations are used for opinion mining and information extraction as they have an easy to extract span and they can be attributed to a speaker with high accuracy. However, simply focusing on direct quotations ignores around half of all reported speech, which is in the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation ex...
A Study of Information Extraction Tools for Online English Newspapers (PDF): Comparative Analysis
Information retrieval is the task of retrieving relevant and useful information from e-newspapers. Electronic newspapers are electronic replicas of traditional newspapers. E-newspapers are becoming increasingly popular because of the ease and convenience in accessing them. Newspapers are the source of timely information. These are the documents comprising news items and several independent info...
Examining the Impact of Coreference Resolution on Quote Attribution
Quote attribution is the task of identifying the speaker of each quote within a document. While recent research has established large-scale corpora for this task, these corpora are not yet consistent in the way they handle candidate speakers, and many of the reported results rely on gold standard annotations of both entities and coreference chains. In this work we evaluate three quote attributi...